
    DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks

    Non-contact video-based physiological measurement has many applications in health care and human-computer interaction. Practical applications require measurements to be accurate even in the presence of large head rotations. We propose the first end-to-end system for video-based measurement of heart and breathing rate using a deep convolutional network. The system features a new motion representation based on a skin reflection model and a new attention mechanism using appearance information to guide motion estimation, both of which enable robust measurement under heterogeneous lighting and major motions. Our approach significantly outperforms all current state-of-the-art methods on both RGB and infrared video datasets. Furthermore, it allows spatial-temporal distributions of physiological signals to be visualized via the attention mechanism. Comment: Accepted paper at ECCV 2018. 16 pages, 3 figures, supplementary materials in the ancillary file.
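
    For readers who want a concrete picture of appearance-guided attention, the following is a minimal PyTorch sketch, assuming a two-branch block in which an appearance frame produces a soft spatial mask that gates motion features; the layer sizes, the normalized frame-difference input and the mask normalization are illustrative assumptions, not the authors' exact configuration.

    # Minimal sketch (PyTorch) of appearance-guided attention gating motion features.
    import torch
    import torch.nn as nn

    class AttentionGatedBlock(nn.Module):
        def __init__(self, channels=32):
            super().__init__()
            self.motion_conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)
            self.appearance_conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)
            self.attention = nn.Conv2d(channels, 1, kernel_size=1)  # 1-channel soft mask

        def forward(self, motion_frames, appearance_frames):
            m = torch.tanh(self.motion_conv(motion_frames))
            a = torch.tanh(self.appearance_conv(appearance_frames))
            mask = torch.sigmoid(self.attention(a))
            # Spatially normalize the mask so it redistributes, rather than rescales, energy.
            mask = mask / (mask.sum(dim=(2, 3), keepdim=True) + 1e-8) * mask.shape[2] * mask.shape[3]
            return m * mask  # appearance guides where motion features contribute

    # Usage: the motion input could be a normalized difference of consecutive frames,
    # the appearance input the raw frame.
    block = AttentionGatedBlock()
    motion = torch.randn(1, 3, 36, 36)
    appearance = torch.randn(1, 3, 36, 36)
    features = block(motion, appearance)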

    Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards

    As people learn to navigate the world, autonomic nervous system (e.g., "fight or flight") responses provide intrinsic feedback about the potential consequences of action choices (e.g., becoming nervous when close to a cliff edge or driving fast around a bend). Physiological changes are correlated with these biological preparations to protect oneself from danger. We present a novel approach to reinforcement learning that leverages a task-independent intrinsic reward function trained on peripheral pulse measurements that are correlated with human autonomic nervous system responses. Our hypothesis is that such reward functions can circumvent the challenges associated with sparse and skewed rewards in reinforcement learning settings and can help improve sample efficiency. We test this in a simulated driving environment and show that it can increase the speed of learning and reduce the number of collisions during the learning stage.
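
    As a rough illustration of the idea of an intrinsic physiological reward, the sketch below combines an extrinsic task reward with a penalty proportional to a predicted pulse-derived response; the nervousness_model stand-in and the weighting lam are hypothetical and not the paper's exact formulation.

    # Combine an extrinsic task reward with an intrinsic, physiologically derived penalty.
    import numpy as np

    def shaped_reward(extrinsic_reward, observation, nervousness_model, lam=0.5):
        # Higher predicted "fight or flight" response -> larger penalty, which encourages
        # risk-averse behavior even when extrinsic rewards are sparse.
        predicted_response = float(nervousness_model(observation))  # assumed to lie in [0, 1]
        return extrinsic_reward - lam * predicted_response

    # Example with a stand-in predictor (a real system would use a trained pulse model):
    dummy_model = lambda obs: 1.0 / (1.0 + np.exp(-obs.mean()))
    r = shaped_reward(extrinsic_reward=0.0,
                      observation=np.random.randn(84, 84),
                      nervousness_model=dummy_model)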

    Identifying Bias in AI using Simulation

    Machine-learned models exhibit bias, often because the datasets used to train them are biased. This presents a serious problem for the deployment of such technology, as the resulting models might perform poorly on populations that are minorities within the training set and ultimately present higher risks to them. We propose to use high-fidelity computer simulations to interrogate and diagnose biases within ML classifiers. We present a framework that leverages Bayesian parameter search to efficiently characterize the high-dimensional feature space and more quickly identify weaknesses in performance. We apply our approach to an example domain, face detection, and show that it can be used to help identify demographic biases in commercial face application programming interfaces (APIs).
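
    A hedged sketch of the search loop: Bayesian optimization (here via scikit-optimize's gp_minimize) proposes simulator parameters and scores whether a detector succeeds, so the optimizer homes in on failure-prone regions. The render_face and face_api_detects functions are placeholders standing in for a high-fidelity simulator and a commercial API.

    import numpy as np
    from skopt import gp_minimize  # scikit-optimize

    def render_face(skin_tone, head_yaw, light_intensity):
        # Placeholder for a high-fidelity renderer: returns a synthetic "image".
        return np.full((64, 64, 3), skin_tone * light_intensity)

    def face_api_detects(image, head_yaw):
        # Placeholder for a commercial face API; failure is made more likely for dark,
        # dim renders at extreme yaw purely so the demo runs end to end.
        return (image.mean() > 0.2) and (abs(head_yaw) < 60)

    def failure_score(params):
        skin_tone, head_yaw, light_intensity = params
        image = render_face(skin_tone, head_yaw, light_intensity)
        # Low score where detection fails, so the optimizer seeks out failure regions.
        return 1.0 if face_api_detects(image, head_yaw) else 0.0

    search_space = [(0.0, 1.0),     # skin tone (dark -> light)
                    (-90.0, 90.0),  # head yaw in degrees
                    (0.0, 1.0)]     # lighting intensity
    result = gp_minimize(failure_score, search_space, n_calls=30, random_state=0)
    print("Most failure-prone parameters found:", result.x)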

    DeepMag: Source Specific Motion Magnification Using Gradient Ascent

    Many important physical phenomena involve subtle signals that are difficult to observe with the unaided eye, yet visualizing them can be very informative. Current motion magnification techniques can reveal these small temporal variations in video, but require precise prior knowledge about the target signal, and cannot deal with interfering motions at a similar frequency. We present DeepMag, an end-to-end deep neural video-processing framework based on gradient ascent that enables automated magnification of subtle color and motion signals from a specific source, even in the presence of large motions of various velocities. While the approach is generalizable, the advantages of DeepMag are highlighted via the task of video-based physiological visualization. Through systematic quantitative and qualitative evaluation of the approach on videos with different levels of head motion, we compare the magnification of pulse and respiration to existing state-of-the-art methods. Our method produces magnified videos with substantially fewer artifacts and less blurring whilst magnifying the physiological changes by a similar degree. Comment: 24 pages, 13 figures.
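
    The gradient-ascent idea can be sketched in PyTorch as follows: repeatedly nudge the input frame along the gradient of a physiological predictor so that the source-specific variation is amplified. The stand-in predictor, step size and iteration count are assumptions for illustration and omit the paper's handling of large interfering motions.

    import torch
    import torch.nn as nn

    physio_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 36 * 36, 1))  # stand-in predictor

    def magnify(frame, model, alpha=5.0, steps=3):
        frame = frame.clone().requires_grad_(True)
        for _ in range(steps):
            pred = model(frame).sum()            # predicted physiological signal strength
            grad, = torch.autograd.grad(pred, frame)
            # Move the frame along the gradient so the source-specific variation grows.
            frame = (frame + alpha * grad).detach().requires_grad_(True)
        return frame.detach()

    frame = torch.rand(1, 3, 36, 36)
    magnified = magnify(frame, physio_net)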

    A Scalable Approach for Facial Action Unit Classifier Training Using Noisy Data for Pre-Training

    Machine learning systems are being used to automate many types of laborious labeling tasks. Facial action coding is an example of such a labeling task that requires copious amounts of time and an above-average level of human domain expertise. In recent years, the use of end-to-end deep neural networks has led to significant improvements in action unit recognition performance and many network architectures have been proposed. Do the more complex deep neural network (DNN) architectures perform sufficiently well to justify the additional complexity? We show that pre-training on a large, diverse set of noisy data can result in even a simple CNN model improving over the current state-of-the-art DNN architectures. The average F1-score achieved with our proposed method on the DISFA dataset is 0.60, compared to a previous state-of-the-art of 0.57. Additionally, we show how the number of subjects and number of images used for pre-training impacts the model performance. The approach that we have outlined is open-source, highly scalable, and not dependent on the model architecture. We release the code and data: https://github.com/facialactionpretrain/facs
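
    A minimal sketch of the pre-train-then-fine-tune recipe, assuming a deliberately simple CNN with multi-label action unit outputs; the architecture, dummy loaders and hyperparameters below are placeholders rather than the released configuration.

    import torch
    import torch.nn as nn

    NUM_AUS = 12
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(32, NUM_AUS),
    )
    criterion = nn.BCEWithLogitsLoss()  # multi-label action unit classification

    def run_epoch(loader, lr):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for images, labels in loader:
            opt.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            opt.step()

    # Stage 1: pre-train on a large, noisily labeled set; Stage 2: fine-tune on the
    # smaller, carefully annotated target set (e.g., DISFA) at a lower learning rate.
    noisy_loader = [(torch.rand(8, 3, 64, 64), torch.randint(0, 2, (8, NUM_AUS)).float())]
    target_loader = [(torch.rand(8, 3, 64, 64), torch.randint(0, 2, (8, NUM_AUS)).float())]
    run_epoch(noisy_loader, lr=1e-3)
    run_epoch(target_loader, lr=1e-4)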

    Do Facial Expressions Predict Ad Sharing? A Large-Scale Observational Study

    People often share news and information with their social connections, but why do some advertisements get shared more than others? A large-scale test examines whether facial responses predict sharing. Facial expressions play a key role in emotional expression. Using scalable automated facial coding algorithms, we quantify the facial expressions of thousands of individuals in response to hundreds of advertisements. Results suggest that not all emotions expressed during viewing increase sharing, and that the relationship between emotion and transmission is more complex than mere valence alone. Facial actions linked to positive emotions (i.e., smiles) were associated with increased sharing. But while some actions associated with negative emotion (e.g., lip depressor, associated with sadness) were linked to decreased sharing, others (i.e., nose wrinkles, associated with disgust) were linked to increased sharing. The ability to quickly collect facial responses at scale in people's natural environment has important implications for marketers and opens up a range of avenues for further research. Comment: 33 pages.
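
    To illustrate the kind of analysis involved, the toy sketch below fits a logistic regression of a sharing outcome on automatically coded facial-action features; the feature names and synthetic data are assumptions and do not reproduce the study's models or coding pipeline.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Per-viewer features: mean intensity of smile, lip depressor, and nose wrinkle.
    X = rng.random((500, 3))
    shared = (rng.random(500) < 0.2).astype(int)  # synthetic "did the viewer share?" labels

    clf = LogisticRegression().fit(X, shared)
    # Coefficient signs indicate whether each expression is associated with more or less sharing.
    print(dict(zip(["smile", "lip_depressor", "nose_wrinkle"], clf.coef_[0])))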

    iPhys: An Open Non-Contact Imaging-Based Physiological Measurement Toolbox

    Imaging-based, non-contact measurement of physiology (including imaging photoplethysmography and imaging ballistocardiography) is a growing field of research. There are several strengths of imaging methods that make them attractive. They remove the need for uncomfortable contact sensors and can enable spatial and concomitant measurement from a single sensor. Furthermore, cameras are ubiquitous and often low-cost solutions for sensing. Open source toolboxes help accelerate the progress of research by providing a means to compare new approaches against standard implementations of the state-of-the-art. We present an open source imaging-based physiological measurement toolbox with implementations of many of the most frequently employed computational methods. We hope that this toolbox will contribute to the advancement of non-contact physiological sensing methods.
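
    As an indication of the class of methods such a toolbox implements, here is a small Python sketch of a basic imaging-PPG pipeline (spatial averaging, band-pass filtering, spectral peak for heart rate); it is not the toolbox's own code, and the band limits and green-channel choice are illustrative.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def estimate_heart_rate(frames, fps=30.0):
        """frames: array of shape (T, H, W, 3), values in [0, 1]."""
        green = frames[..., 1].mean(axis=(1, 2))               # spatial average of green channel
        green = green - green.mean()
        b, a = butter(3, [0.7, 4.0], btype="bandpass", fs=fps)  # roughly 42-240 bpm band
        pulse = filtfilt(b, a, green)
        spectrum = np.abs(np.fft.rfft(pulse))
        freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
        return 60.0 * freqs[np.argmax(spectrum)]                 # peak frequency in beats per minute

    hr = estimate_heart_rate(np.random.rand(300, 36, 36, 3))     # ~10 s of synthetic video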

    A Multimodal Emotion Sensing Platform for Building Emotion-Aware Applications

    Humans use a host of signals to infer the emotional state of others. In general, computer systems that leverage signals from multiple modalities will be more robust and accurate at the same tasks. We present a multimodal affect and context sensing platform. The system is composed of video, audio and application analysis pipelines that leverage ubiquitous sensors (camera and microphone) to log and broadcast emotion data in real time. The platform is designed to enable easy prototyping of novel computer interfaces that sense, respond and adapt to human emotion. This paper describes the different audio, visual and application processing components and explains how the data is stored and/or broadcast for other applications to consume. We hope that this platform helps advance the state-of-the-art in affective computing by enabling the development of novel human-computer interfaces.
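
    A minimal sketch of the platform's publish/subscribe shape, assuming modality pipelines push timestamped estimates to a bus that stores or broadcasts them; the class and field names are hypothetical and the modality models themselves are omitted.

    import json, time

    class EmotionBus:
        def __init__(self):
            self.subscribers = []

        def subscribe(self, callback):
            self.subscribers.append(callback)

        def publish(self, video_estimate, audio_estimate, app_context):
            record = {
                "timestamp": time.time(),
                "video": video_estimate,      # e.g., facial-expression-based valence
                "audio": audio_estimate,      # e.g., prosody-based arousal
                "application": app_context,   # e.g., active window or task
            }
            payload = json.dumps(record)      # stored and/or broadcast to consumers
            for callback in self.subscribers:
                callback(payload)

    bus = EmotionBus()
    bus.subscribe(lambda payload: print("consumer received:", payload))
    bus.publish(video_estimate={"valence": 0.4}, audio_estimate={"arousal": 0.6},
                app_context={"foreground_app": "editor"})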

    Modeling Affect-based Intrinsic Rewards for Exploration and Learning

    Positive affect has been linked to increased interest, curiosity and satisfaction in human learning. In reinforcement learning, extrinsic rewards are often sparse and difficult to define; intrinsically motivated learning can help address these challenges. We argue that positive affect is an important intrinsic reward that effectively helps drive exploration that is useful in gathering experiences. We present a novel approach leveraging a task-independent reward function trained on spontaneous smile behavior that reflects the intrinsic reward of positive affect. To evaluate our approach we trained several downstream computer vision tasks on data collected with our policy and several baseline methods. We show that the policy based on our affective rewards successfully increases the duration of episodes and the area explored, and reduces collisions. The result is faster learning for several downstream computer vision tasks.
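
    A rough sketch of the exploration loop, assuming a smile-derived intrinsic reward is added to the environment reward and the visited frames are kept for downstream training; the toy environment, policy and smile model are stand-ins, not the paper's components.

    import numpy as np

    class ToyEnv:
        # Stand-in environment: observations are random frames; episodes end at random.
        def reset(self):
            return np.random.rand(36, 36, 3)
        def step(self, action):
            return np.random.rand(36, 36, 3), 0.0, np.random.rand() < 0.02, {}

    def collect_with_affective_reward(env, policy, smile_model, steps=200, beta=1.0):
        dataset, obs = [], env.reset()
        for _ in range(steps):
            action = policy(obs)
            next_obs, extrinsic, done, _ = env.step(action)
            intrinsic = beta * float(smile_model(next_obs))  # predicted positive affect
            # A real agent would update its policy with (extrinsic + intrinsic) here.
            dataset.append(next_obs)                         # experiences reused downstream
            obs = env.reset() if done else next_obs
        return dataset

    frames = collect_with_affective_reward(ToyEnv(),
                                           policy=lambda obs: 0,
                                           smile_model=lambda obs: obs.mean())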

    M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

    Generative adversarial networks have led to significant advances in cross-modal/domain translation. However, typically these networks are designed for a specific task (e.g., dialogue generation or image synthesis, but not both). We present a unified model, M3D-GAN, that can translate across a wide range of modalities (e.g., text, image, and speech) and domains (e.g., attributes in images or emotions in speech). Our model consists of modality subnets that convert data from different modalities into unified representations, and a unified computing body where data from different modalities share the same network architecture. We introduce a universal attention module that is jointly trained with the whole network and learns to encode a large range of domain information into a highly structured latent space. We use this to control synthesis in novel ways, such as producing diverse realistic pictures from a sketch or varying the emotion of synthesized speech. We evaluate our approach on extensive benchmark tasks, including image-to-image, text-to-image, image captioning, text-to-speech, speech recognition, and machine translation. Our results show state-of-the-art performance on some of the tasks.
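
    A simplified PyTorch sketch of the overall shape described above: per-modality subnets map inputs into a shared representation, an attention module selects from a learned bank of domain codes, and a unified body processes the result; the dimensions and components are illustrative assumptions only.

    import torch
    import torch.nn as nn

    class UniversalAttention(nn.Module):
        def __init__(self, dim=128, num_codes=16):
            super().__init__()
            self.codes = nn.Parameter(torch.randn(num_codes, dim))  # structured domain latents

        def forward(self, h):
            weights = torch.softmax(h @ self.codes.t(), dim=-1)     # attend over domain codes
            return h + weights @ self.codes

    class M3DSketch(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            self.text_subnet = nn.Linear(300, dim)    # e.g., text embedding -> unified space
            self.image_subnet = nn.Linear(2048, dim)  # e.g., image feature -> unified space
            self.attention = UniversalAttention(dim)
            self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, x, modality):
            h = self.text_subnet(x) if modality == "text" else self.image_subnet(x)
            return self.body(self.attention(h))       # shared body for all modalities

    model = M3DSketch()
    out_text = model(torch.randn(1, 300), "text")
    out_image = model(torch.randn(1, 2048), "image")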